23 research outputs found

    PocketMatch: A new algorithm to compare binding sites in protein structures

    Background: Recognizing similarities and deriving relationships among protein molecules is a fundamental requirement in present-day biology. Similarities can be present at various levels, which can be detected through comparison of protein sequences or their structural folds. In some cases, similarities that are obscure at these levels could be present merely in the substructures at their binding sites. Inferring functional similarities between protein molecules by comparing their binding sites is still largely exploratory and not as yet a routine protocol. One of the main reasons for this is the limited choice of appropriate analytical tools that can compare binding sites with high sensitivity. To benefit from the enormous amount of structural data that is being rapidly accumulated, it is essential to have high-throughput tools that enable large-scale binding site comparison.

    Results: Here we present a new algorithm, PocketMatch, for comparison of binding sites in a frame-invariant manner. Each binding site is represented by 90 lists of sorted distances capturing the shape and chemical nature of the site. The sorted arrays are then aligned using an incremental alignment method and scored to obtain PMScores for pairs of sites. A comprehensive sensitivity analysis and an extensive validation of the algorithm have been carried out. Perturbation studies, in which the geometry of a given site was retained but the residue types were changed randomly, indicated that chance similarities were virtually non-existent. Our analysis also demonstrates that shape information alone is insufficient to discriminate between diverse binding sites unless combined with the chemical nature of the amino acids.

    Conclusions: A new algorithm has been developed to compare binding sites in an accurate, efficient and high-throughput manner. Though the representation used is conceptually simplistic, we demonstrate that, along with the new alignment strategy used, it is sufficient to enable binding site comparison with high sensitivity. Novel methodology has also been presented for validating the algorithm for accuracy and sensitivity with respect to geometry and chemical nature of the site. The method is also fast and takes about 1/250th of a second for one comparison on a single processor. A parallel version on BlueGene has also been implemented.
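
The comparison of sorted distance lists described in this abstract can be illustrated with a minimal Python sketch. It is not the published PocketMatch implementation: the dictionary keys standing in for the labelled lists, the 0.5 tolerance, the two-pointer matching and the normalisation by the larger site are all assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical key: a label for one of the per-site distance lists,
# e.g. a pair of residue-group / point-type tags.
Key = Tuple[str, str]


def count_matches(a: List[float], b: List[float], tol: float) -> int:
    """Walk two sorted lists in step and count distance pairs within tol."""
    i = j = matches = 0
    while i < len(a) and j < len(b):
        diff = a[i] - b[j]
        if abs(diff) <= tol:
            matches += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # a[i] is too small to match b[j] or anything after it
        else:
            j += 1  # b[j] is too small to match a[i] or anything after it
    return matches


def similarity(site1: Dict[Key, List[float]],
               site2: Dict[Key, List[float]],
               tol: float = 0.5) -> float:
    """Fraction of matched distances, normalised by the larger site (assumed)."""
    matched = sum(
        count_matches(sorted(dists), sorted(site2.get(key, [])), tol)
        for key, dists in site1.items()
    )
    n1 = sum(len(v) for v in site1.values())
    n2 = sum(len(v) for v in site2.values())
    larger = max(n1, n2)
    return matched / larger if larger else 0.0


# Same geometry, same labels versus same geometry, changed labels.
site = {("A", "B"): [3.8, 5.1, 7.4], ("A", "A"): [4.0]}
relabelled = {("C", "B"): [3.8, 5.1, 7.4], ("C", "C"): [4.0]}
print(similarity(site, site))
print(similarity(site, relabelled))
```

Because distances are only compared within lists that share a label, a site with identical geometry but different residue labels scores zero in this toy version, which mirrors the abstract's point that shape alone is not enough.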

    targetTB: A target identification pipeline for Mycobacterium tuberculosis through an interactome, reactome and genome-scale structural analysis

    Background: Tuberculosis still remains one of the largest killer infectious diseases, warranting the identification of newer targets and drugs. Identification and validation of appropriate targets for designing drugs are critical steps in drug discovery, which are at present major bottlenecks. A majority of drugs in current clinical use for many diseases have been designed without knowledge of the targets, perhaps because standard methodologies to identify such targets in a high-throughput fashion do not really exist. With the different kinds of 'omics' data that are now available, computational approaches can be powerful means of obtaining short-lists of possible targets for further experimental validation.

    Results: We report a comprehensive in silico target identification pipeline, targetTB, for Mycobacterium tuberculosis. The pipeline incorporates a network analysis of the protein-protein interactome, a flux balance analysis of the reactome, experimentally derived phenotype essentiality data, sequence analyses and a structural assessment of targetability, using novel algorithms recently developed by us. Using flux balance analysis and network analysis, proteins critical for survival of M. tuberculosis are first identified, followed by comparative genomics with the host, finally incorporating a novel structural analysis of the binding sites to assess the feasibility of a protein as a target. Further analyses include correlation with expression data and non-similarity to gut flora proteins as well as 'anti-targets' in the host, leading to the identification of 451 high-confidence targets. Through phylogenetic profiling against 228 pathogen genomes, shortlisted targets have been further explored to identify broad-spectrum antibiotic targets, while also identifying those specific to tuberculosis. Targets that address mycobacterial persistence and drug resistance mechanisms are also analysed.

    Conclusion: The pipeline developed provides a rational schema for drug target identification that is likely to have high rates of success, which is expected to save enormous amounts of money, resources and time in the drug discovery process. A thorough comparison with previously suggested targets in the literature demonstrates the usefulness of the integrated approach used in our study, highlighting the importance of systems-level analyses in particular. The method has the potential to be used as a general strategy for target identification and validation and hence significantly impact most drug discovery programmes.

    An automated framework for understanding structural variations in the binding grooves of MHC class II molecules

    Background: MHC/HLA class II molecules are important components of the immune system and play a critical role in processes such as phagocytosis. Understanding peptide recognition properties of the hundreds of MHC class II alleles is essential to appreciate determinants of antigenicity and ultimately to predict epitopes. While there are several methods for epitope prediction, each differing in its success rate, there are no reports so far in the literature that systematically characterize the binding sites at the structural level and infer recognition profiles from them.

    Results: Here we report a new approach to compare the binding sites of MHC class II molecules using their three-dimensional structures. We use a specifically tuned version of our recent algorithm, PocketMatch. We show that our methodology is useful for classification of MHC class II molecules based on similarities or differences among their binding sites. A new module has been used to define binding sites in MHC molecules. Comparison of binding sites of 103 MHC molecules, both at the whole groove and individual sub-pocket levels, has been carried out, and their clustering patterns analyzed. While clusters largely agree with serotypic classification, deviations from it and several new insights are obtained from our study. We also present how differences in sub-pockets of molecules associated with a pair of autoimmune diseases, narcolepsy and rheumatoid arthritis, were captured by PocketMatch.

    Conclusion: The systematic framework for understanding structural variations in MHC class II molecules enables large-scale comparison of binding grooves and sub-pockets, which is likely to have direct implications for predicting epitopes and understanding peptide binding preferences.

    Structural Annotation of Mycobacterium tuberculosis Proteome

    Of the ∼4000 ORFs identified through the genome sequence of Mycobacterium tuberculosis (TB) H37Rv, experimentally determined structures are available for 312. Since knowledge of protein structures is essential to obtain a high-resolution understanding of the underlying biology, we seek to obtain a structural annotation for the genome, using computational methods. Structural models were obtained and validated for ∼2877 ORFs, covering ∼70% of the genome. Functional annotation of each protein was based on fold-based functional assignments and a novel binding site based ligand association. New algorithms for binding site detection and genome scale binding site comparison at the structural level, recently reported from the laboratory, were utilized. Besides these, the annotation covers detection of various sequence and sub-structural motifs and quaternary structure predictions based on the corresponding templates. The study provides an opportunity to obtain a global perspective of the fold distribution in the genome. The annotation indicates that cellular metabolism can be achieved with only 219 folds. New insights about the folds that predominate in the genome, as well as the fold-combinations that make up multi-domain proteins are also obtained. 1728 binding pockets have been associated with ligands through binding site identification and sub-structure similarity analyses. The resource (http://proline.physics.iisc.ernet.in/Tbstructuralannotation), being one of the first to be based on structure-derived functional annotations at a genome scale, is expected to be useful for better understanding of TB and for application in drug discovery. The reported annotation pipeline is fairly generic and can be applied to other genomes as well

    PocketDepth: A new depth based algorithm for identification of ligand binding sites in proteins

    Predicting functional sites in proteins is important in structural biology for understanding the function and also for structure-based drug design. Here we report a new binding site prediction method PocketDepth, which is geometry based and uses a depth based clustering. Depth is an important parameter considered during protein structure visualisation and analysis but has been used more often intuitively than systematically. Our current implementation of depth reflects how central a given subspace is to a putative pocket. We have tested the algorithm against PDBbind, a large curated set of 1091 proteins. A prediction was considered a true-positive if the predicted pocket had at least 10% overlap with the actual ligand. Two different parameter sets, ‘deeper’ and ‘surface’ were used, for wider coverage of different types of binding sites in proteins. With deeper parameters, true-positives were observed for 841 proteins, resulting in a prediction accuracy of 77%, for any ranked prediction. Of these, 55.2% were first ranked predictions, whereas 91.2% and 97.4% were covered in the first 5 and 10 ranks, respectively. With the ‘surface’ parameters, a prediction rate of 95.8% was observed, albeit with much poorer ranks. The deeper set identified pocket boundaries more precisely and yielded better ranks, while the latter missed fewer predictions and hence had better coverage. The two parameter sets were therefore algorithmically combined, resulting in prediction accuracies of 96.5% for any ranked prediction. About 41.8% of these were in the first rank, 82% and 94% were in top 5 and 10 ranks, respectively. The algorithm is available at http://proline.physics.iisc.ernet.in/pocketdepth
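
The headline accuracy for the 'deeper' parameter set can be reproduced from the counts quoted in the abstract. The short Python check below is only an arithmetic illustration; the helper name `percent` is hypothetical, and the first-rank count is an estimate derived from the reported 55.2%, not a figure stated in the abstract.

```python
def percent(part: int, total: int) -> float:
    """Return part/total as a percentage; total must be positive."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


# Counts quoted in the abstract: 841 true positives out of 1091 proteins.
deeper_accuracy = percent(841, 1091)
print(f"deeper accuracy: {deeper_accuracy:.1f}%")

# Estimated number of first-ranked hits, assuming 55.2% of the 841 true positives.
first_rank_estimate = round(0.552 * 841)
print(f"estimated first-ranked predictions: {first_rank_estimate}")
```

The computed value agrees with the rounded 77% reported for any ranked prediction.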
